Fit Gamma factor

This notebook estimates the gamma factor from a set of 5 μs-ALEX smFRET measurements.

What this notebook does?

According to Lee 2005 (PDF, SI PDF), we estimate the $\gamma$-factor from Proximity Ratio (PR) and S values (with background, leakage and direct excitation correction) for a set of 5 μs-ALEX measurements.

The PR and S values are computed by the template notebook:

which is executed for each sample using usALEX-Batch.

From Lee 2005 (equation 20), the following linear relation holds:

$$\frac{1}{S} = \Omega + \Sigma \cdot E_{PR}$$

Once $\Omega$ and $\Sigma$ are fitted, we can compute the $\gamma$-factor as (equation 22):

$$\gamma = (\Omega-1)/(\Omega + \Sigma-1)$$

Import libraries


In [3]:
from __future__ import division
import numpy as np
import pandas as pd
import lmfit
from scipy.stats import linregress

Computation

This notebook read data from the file:


In [6]:
data_file = 'results/usALEX-5samples-PR-leakage-dir-ex-all-ph.txt'

In [7]:
data = pd.read_csv(data_file, sep="\s+").set_index('sample')
data


Out[7]:
n_bursts_all n_bursts_do n_bursts_fret E_pr_fret E_pr_fret_sig S_pr_fret S_pr_fret_sig E_pr_do_kde nt_mean E_pr_fret_kde
sample
7d 1169 582 544 0.926413 0.002498 0.553918 0.004235 0.0014 22.265796 0.9302
12d 1304 329 945 0.731053 0.003017 0.565995 0.003207 0.0164 21.905752 0.7432
17d 2483 462 1958 0.427139 0.002573 0.543997 0.002362 0.0126 21.094356 0.4318
22d 2051 319 1670 0.183379 0.001969 0.549327 0.002511 0.0008 22.987313 0.1796
27d 787 160 586 0.083907 0.002890 0.562717 0.004169 -0.0068 16.941794 0.0844

In [8]:
data[['E_pr_fret', 'E_pr_fret_kde']]


Out[8]:
E_pr_fret E_pr_fret_kde
sample
7d 0.926413 0.9302
12d 0.731053 0.7432
17d 0.427139 0.4318
22d 0.183379 0.1796
27d 0.083907 0.0844

In [9]:
res = linregress(data.E_pr_fret_kde, 1/data.S_pr_fret)
slope, intercept, r_val, p_val, stderr = res

For more info see scipy.stats.linearregress.


In [10]:
Sigma = slope 
Sigma


Out[10]:
-0.009993289402355424

In [11]:
Omega = intercept
Omega


Out[11]:
1.8063091259676307

In [10]:
r_val


Out[10]:
0.030997465223121211

In [11]:
r_val**2


Out[11]:
0.00096084285025860891

P-value (to test the null hypothesis that the slope is zero):


In [12]:
p_val


Out[12]:
0.96053912269852137

Gamma computed from the previous fitted values:


In [12]:
gamma = (Omega - 1)/(Omega + Sigma - 1)
gamma


Out[12]:
1.012549404323615

In [13]:
with open('results/usALEX - gamma factor - all-ph.txt', 'w') as f:
    f.write(str(gamma))

Corrected FRET values

With the knoledge of $\gamma$ we can compute the corrected FRET efficiencies.

Let define the function to apply gamma correction (from burstlib.py):


In [14]:
def gamma_correct_E(Er, gamma):
    """Apply gamma correction to the uncorrected FRET `Er`."""
    Er = np.asarray(Er)
    return Er/(gamma-gamma*Er+Er)

The corrected values are:


In [15]:
E_alex = gamma_correct_E(data.E_pr_fret, gamma)
E_alex


Out[15]:
array([ 0.92456139,  0.73109077,  0.42336969,  0.18263484,  0.07914672])

In [16]:
data.E_pr_fret


Out[16]:
sample
7d        0.924344
12d       0.730478
17d       0.422609
22d       0.182170
27d       0.078920
Name: E_pr_fret, dtype: float64

Non-linear least squares, using Lmfit

With a non-linear least-squares minimization we are not limited to fit a linear relation.

Let's rewrite the relation without inverting $S_R$:

$$S_R = \frac{1}{\Omega + \Sigma \cdot E_{PR}}$$

In [17]:
import lmfit
from lmfit import minimize, Parameters, report_fit, report_ci, report_errors

In [18]:
def residual(params, Er, Sr, eps_data=None):
    slope = params['slope'].value
    intercept = params['intercept'].value

    model = 1./(intercept + slope*Er)
    if eps_data is None:
        eps_data = np.ones_like(Er)
    return (Sr - model)/eps_data

params = Parameters()
params.add('intercept', value=2)
params.add('slope', value=0.)

out = minimize(residual, params, args=(data.E_pr_fret, data.S_pr_fret))

In [19]:
out.values


Out[19]:
{'intercept': 1.7406596630381004, 'slope': 0.0030796355446684408}

In [20]:
report_fit(params)


[[Variables]]
     intercept:     1.74066 +/- 0.02472581 (1.42%) initial =  2
     slope:         0.003079636 +/- 0.0436766 (1418.24%) initial =  0
[[Correlations]] (unreported correlations are <  0.100)

In [21]:
Omega = out.values['intercept']
Sigma = out.values['slope']

gamma = (Omega - 1)/(Omega + Sigma - 1)
gamma


Out[21]:
0.99585925397442776

We can compute the confidence intervals (method details) and derive the effect on $\gamma$:


In [22]:
ci = lmfit.conf_interval(out)
report_ci(ci)


/Users/anto/anaconda/lib/python2.7/site-packages/lmfit/confidence.py:289: UserWarning: Warning, maxiter=200 reachedand prob(slope=-0.197920364455) = 0.980791305555 < max(sigmas).
  warn(errmsg)
/Users/anto/anaconda/lib/python2.7/site-packages/lmfit/confidence.py:289: UserWarning: Warning, maxiter=200 reachedand prob(slope=0.204079635545) = 0.98002107331 < max(sigmas).
  warn(errmsg)
             99.70%    95.00%    67.40%     0.00%    67.40%    95.00%    99.70%
    slope      -inf  -0.13579  -0.04813   0.03940   0.05448   0.14337       inf
intercept   1.53797   1.66423   1.71197   1.74066   1.77002   1.82203   1.98211

Let's compute gamma in the 95% confidence interval for the 2 parameters using a simple brute-force method:


In [23]:
gamma_list = []
for sig_err in np.linspace(ci['slope'][1][1], ci['slope'][5][1], 20):
    for omega_err in  np.linspace(ci['intercept'][1][1], ci['intercept'][5][1], 20):
        Omega = out.values['intercept'] + omega_err
        Sigma = out.values['slope'] + sig_err

        gamma = (Omega - 1)/(Omega + Sigma - 1)
        gamma_list.append(gamma)

In [24]:
np.min(gamma_list), np.max(gamma_list)


Out[24]:
(0.94259831075324385, 1.0584055044039404)

NOTE: Using the 95% confidence interval for the 2 fit parameters we observe a variation on gamma ~ 12%. This only considering the empirical error of the 5 data points respect to the model. Moreover, the underlying assumption is that the errors are equally distributed for all the data points. So a more realistic estimation could be worst than that.

Just for test we can fit the same linear relation we fitted initially with linregress:


In [25]:
def residual2(params, x, data, eps_data=None):
    slope = params['slope'].value
    intercept = params['intercept'].value

    model = (intercept + slope*x)
    if eps_data is None:
        eps_data = np.ones_like(data)
    return (1./data-model)/eps_data

params2 = Parameters()
params2.add('intercept', value=2)
params2.add('slope', value=0.)

out2 = minimize(residual2, params2, args=(data.E_pr_fret, data.S_pr_fret))

In [26]:
ci = lmfit.conf_interval(out2)
report_ci(ci)


/Users/anto/anaconda/lib/python2.7/site-packages/lmfit/confidence.py:289: UserWarning: Warning, maxiter=200 reachedand prob(slope=-0.198576396493) = 0.9808242408 < max(sigmas).
  warn(errmsg)
/Users/anto/anaconda/lib/python2.7/site-packages/lmfit/confidence.py:289: UserWarning: Warning, maxiter=200 reachedand prob(slope=0.203423603507) = 0.980824277424 < max(sigmas).
  warn(errmsg)
             99.70%    95.00%    67.40%     0.00%    67.40%    95.00%    99.70%
    slope      -inf  -0.13627  -0.04863   0.03839   0.05348   0.14112       inf
intercept   1.52166   1.66272   1.71237   1.74130   1.77024   1.81989   1.96096

In [27]:
Omega = out2.values['intercept']
Sigma = out2.values['slope']

gamma = (Omega - 1)/(Omega + Sigma - 1)
gamma


Out[27]:
0.9967412671491358

Alternative method

By definition the stoichiometry is independent from E:

$$ S = \frac{\gamma F_D + F_A}{\gamma F_D + F_A + F_{AA}}$$

Neglecting $\gamma$ we obtain a raw stoichiometry (analogous of the proxhimity ratio):

$$ S_R = \frac{F_D + F_A}{F_D + F_A + F_{AA}}$$

Changes of $S_R$ as a function of $E$ indicates $\gamma \ne 1$.

We can fit $\gamma$ by minimizing the variance of $S$. However, experimentally, we have $E_R$ and $S_R$ and not $F_D$, $F_A$ and $F_{AA}$. We can express $S_R$ as a function of $E_R$:

$$ S_R = \frac{1}{1 + \frac{F_{AA}}{F_D + F_A}} = \frac{1}{1 + E_R\frac{F_{AA}}{F_A}} = \frac{1}{1 + E_R k_2}$$

where $k_2 = F_{AA}/F_A$. $k_2$ as a function of $E_R$ and $S_R$ is:

$$ k_2 = \left(\frac{1}{S_R} - 1\right)/E_R$$

Using this last definition, we can write $S$ as:

$$S = \frac{1}{1 + E k_2} = \frac{1}{1 + g(E_R, \gamma) k_2} = f(\gamma, E_R, S_R)$$

where $g(\cdot)$ function that links $E$ and $E_R$. To find $\gamma$ we can minimize the variance of $S$ (i.e. $f(\cdot)$).

Let start defining $g(\cdot)$ and $k_2(\cdot)$:


In [28]:
def correct_E(Er, gamma):
    Er = np.asanyarray(Er)
    return Er/(gamma-gamma*Er+Er)

def k2(Er, Sr):
    return (1./Sr - 1)/Er

A direct brute-force search gives:


In [29]:
gamma_range = np.arange(0.9, 1.1, 0.00002)
y_std = np.zeros_like(gamma_range)
for i, gamma in enumerate(gamma_range):
    y = 1/(1 + correct_E(data.E_pr_fret, gamma)*k2(data.E_pr_fret, data.S_pr_fret))
    y_std[i] = y.std()
    #plot(y)
gamma_range[y_std == y_std.min()]


Out[29]:
array([ 0.99696])

Expressing the fit as a minimization problem gives exactly the same result:


In [30]:
def residual(params, Er, Sr):
    gamma = params['gamma'].value
    S = 1/(1+correct_E(Er, gamma)*k2(Er, Sr))
    errors = S - S.mean()
    return errors

params = Parameters()
params.add('gamma', value=1.0)

out = minimize(residual, params, args=(data.E_pr_fret, data.S_pr_fret))

In [31]:
out.values['gamma']


Out[31]:
0.99696873118951224

In [32]:
report_fit(params)


[[Variables]]
     gamma:     0.9969687 +/- 0.05088592 (5.10%) initial =  1
[[Correlations]] (unreported correlations are <  0.100)

Here we obtained that the standard deviation of the fitted $\gamma$ is around 5%, consistent with the previous estimations.

Fit plot


In [33]:
%matplotlib inline
from matplotlib.pyplot import *

In [34]:
plot(data.E_pr_fret, 1/data.S_pr_fret, 's')
x = np.arange(0, 1, 0.01)
plot(x, intercept + slope*x, 'k')
ylim(1, 2)
xlabel('E (uncorrected)')
ylabel('1/S (uncorrected)')


Out[34]:
<matplotlib.text.Text at 0x108af1b50>

In [30]:


In [30]:


In [30]: